The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Single-frame InfraRed Small Target (SIRST) detection has been a challenging task due to a lack of inherent characteristics, imprecise bounding box regression, a scarcity of real-world datasets, and sensitive localization evaluation. In this paper, we propose a comprehensive solution to these challenges. First, we find that the existing anchor-free label assignment method is prone to mislabeling small targets as background, leading to their omission by detectors. To overcome this issue, we propose an all-scale pseudo-box-based label assignment scheme that relaxes the constraints on scale and decouples the spatial assignment from the size of the ground-truth target. Second, motivated by the structured prior of feature pyramids, we introduce the one-stage cascade refinement network (OSCAR), which uses the high-level head as soft proposals for the low-level refinement head. This allows OSCAR to process the same target in a cascade coarse-to-fine manner. Finally, we present a new research benchmark for infrared small target detection, consisting of the SIRST-V2 dataset of real-world, high-resolution single-frame targets, the normalized contrast evaluation metric, and the DeepInfrared toolkit for detection. We conduct extensive ablation studies to evaluate the components of OSCAR and compare its performance to state-of-the-art model-driven and data-driven methods on the SIRST-V2 benchmark. Our results demonstrate that a top-down cascade refinement framework can improve the accuracy of infrared small target detection without sacrificing efficiency. The DeepInfrared toolkit, dataset, and trained models are available at https://github.com/YimianDai/open-deepinfrared to advance further research in this field.
translated by 谷歌翻译
Recognition of facial expression is a challenge when it comes to computer vision. The primary reasons are class imbalance due to data collection and uncertainty due to inherent noise such as fuzzy facial expressions and inconsistent labels. However, current research has focused either on the problem of class imbalance or on the problem of uncertainty, ignoring the intersection of how to address these two problems. Therefore, in this paper, we propose a framework based on Resnet and Attention to solve the above problems. We design weight for each class. Through the penalty mechanism, our model will pay more attention to the learning of small samples during training, and the resulting decrease in model accuracy can be improved by a Convolutional Block Attention Module (CBAM). Meanwhile, our backbone network will also learn an uncertain feature for each sample. By mixing uncertain features between samples, the model can better learn those features that can be used for classification, thus suppressing uncertainty. Experiments show that our method surpasses most basic methods in terms of accuracy on facial expression data sets (e.g., AffectNet, RAF-DB), and it also solves the problem of class imbalance well.
translated by 谷歌翻译
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
translated by 谷歌翻译
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment, which has attracted increasing attention over the past decades. Although several COD methods have been developed, they still suffer from unsatisfactory performance due to the intrinsic similarities between the foreground objects and background surroundings. In this paper, we propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection. Specifically, we propose a Boundary Guidance Module (BGM) to explicitly model the boundary characteristic, which can provide boundary-enhanced features to boost the COD performance. To capture the scale variations of the camouflaged objects, we propose a Multi-scale Feature Aggregation Module (MFAM) to characterize the multi-scale information from each layer and obtain the aggregated feature representations. Furthermore, we propose a Cross-level Fusion and Propagation Module (CFPM). In the CFPM, the feature fusion part can effectively integrate the features from adjacent layers to exploit the cross-level correlations, and the feature propagation part can transmit valuable context information from the encoder to the decoder network via a gate unit. Finally, we formulate a unified and end-to-end trainable framework where cross-level features can be effectively fused and propagated for capturing rich context information. Extensive experiments on three benchmark camouflaged datasets demonstrate that our FAP-Net outperforms other state-of-the-art COD models. Moreover, our model can be extended to the polyp segmentation task, and the comparison results further validate the effectiveness of the proposed model in segmenting polyps. The source code and results will be released at https://github.com/taozh2017/FAPNet.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it easy to use a single model for both multilingual ASR and many-to-many ST. In this paper, we propose streaming language-agnostic multilingual speech recognition and translation using neural transducers (LAMASSU). To enable multilingual text generation in LAMASSU, we conduct a systematic comparison between specified and unified prediction and joint networks. We leverage a language-agnostic multilingual encoder that substantially outperforms shared encoders. To enhance LAMASSU, we propose to feed target LID to encoders. We also apply connectionist temporal classification regularization to transducer training. Experimental results show that LAMASSU not only drastically reduces the model size but also outperforms monolingual ASR and bilingual ST models.
translated by 谷歌翻译
在这项工作中,我们研究了基于价值的深钢筋学习(DRL)中简单但普遍适用的奖励成型案例。我们表明,线性转换形式的奖励转移等同于更改函数近似中$ q $ function的初始化。基于这样的等价性,我们带来了关键的见解,即积极的奖励转移会导致保守的剥削,而负面的奖励转移会导致好奇心驱动的探索。因此,保守的剥削改善了离线RL价值估计,乐观的价值估计改善了在线RL的勘探。我们验证了对一系列RL任务的见解,并显示了其对基准的改进:(1)在离线RL中,保守的剥削可根据现成的算法提高性能; (2)在在线连续控制中,具有不同转移常数的多个值函数可用于应对探索 - 诠释困境,以提高样品效率; (3)在离散控制任务中,负奖励转移可以改善基于好奇心的探索方法。
translated by 谷歌翻译
在本文中,我们研究了基于骨架的动作识别的问题,该问题在学习从基础阶级到新颖类的可转移表示方面构成了独特的挑战,尤其是针对细粒度的动作。现有的元学习框架通常依赖于空间维度中的身体级表示,这限制了概括以捕获细粒标签空间中细微的视觉差异。为了克服上述局限性,我们提出了一种基于单发骨架的动作识别的部分感知的原型代表。我们的方法捕获了两个独特的空间级别的骨架运动模式,一种用于所有身体关节的全球环境,称为身体水平,另一个则参与了身体部位的局部空间区域,称为零件水平。我们还设计了一种类不足的注意机制,以突出每个动作类别的重要部分。具体而言,我们开发了一个由三个模块组成的零件感知原型图网络:我们的双层建模的级联嵌入模块,一个基于注意力的零件融合模块,用于融合零件并生成零件感知的原型,以及可以执行匹配的模块。与部分意识表示的分类。我们证明了我们方法对两个基于公共骨架的动作识别数据集的有效性:NTU RGB+D 120和NW-UCLA。
translated by 谷歌翻译
采样是原始点云数据处理的重要组成部分,例如在流行的PointNet ++方案中。最远的点采样(FPS)是最流行的采样方案之一,最远的点采样(FPS)是最远的点并执行距离更新。不幸的是,它的效率低,并且可能成为点云应用的瓶颈。我们提出了由M参数化的可调节FPS(AFP),以积极地降低FPS的复杂性,而不会损害采样性能。具体而言,它将原始点云分为M小点云,并同时将样品M点分为M点。它利用了大约分类点云数据的尺寸局部性,以最大程度地减少其性能降解。 AFPS方法可以在原始FPS上实现22至30倍的速度。此外,我们提出了最近的点距离级别(NPDU)方法,以将距离更新数限制为常数数字。 AFPS方法上的NPDU组合可以在具有2K-32K点的点云上实现34-280X的加速,其算法性能与原始FPS相当。例如,对于Shapenet部件分割任务,它可以达到0.8490实例平均MIOU(联合平均交叉点),与原始FPS相比,它仅下降0.0035。
translated by 谷歌翻译